D252-D260 Nucleic Acids Research, 2012, Vol. 40, Database issue 
doi:10.1093/nar/gkr1189



Published online 6 December 2011 



Minimotif Miner 3.0: database expansion and 
significantly improved reduction of false-positive 
predictions from consensus sequences 

Tian Mi 1, Jerlin Camilus Merlin 1, Sandeep Deverasetty 2, Michael R. Gryk 3,
Travis J. Bill 2, Andrew W. Brooks 2, Logan Y. Lee 2, Viraj Rathnayake 2,
Christian A. Ross 2, David P. Sargeant 2, Christy L. Strong 2, Paula Watts 2,
Sanguthevar Rajasekaran 1,* and Martin R. Schiller 2,*

1 Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-2155,
2 School of Life Sciences, University of Nevada Las Vegas, 4505 Maryland Pkwy., Las Vegas, NV 89154-4004 
and 3 Department of Molecular, Microbial, and Structural Biology, University of Connecticut Health Center, 
263 Farmington Ave., Farmington, CT 06030-3305, USA 

Received September 14, 2011; Revised November 14, 2011; Accepted November 15, 2011 



ABSTRACT 

Minimotif Miner (MnM, available at http://minimotifminer.org or http://mnm.engr.uconn.edu) is an online database for identifying new minimotifs in protein queries. Minimotifs are short contiguous peptide sequences that have a known function in at least one protein. Here we report the third release of the MnM database, which has now grown 60-fold to approximately 300000 minimotifs. Since short minimotifs are by their nature not very complex, we also summarize a new set of false-positive filters and linear regression scoring that vastly enhance minimotif prediction accuracy on a test data set. This online database can be used to predict new functions in proteins and causes of disease.

INTRODUCTION 

A common theme in protein activity regulation is the binding of a structural domain of one protein to a short, contiguous peptide segment of another. From a bioinformatics perspective, identifying domain signatures has been incredibly useful in formulating hypotheses regarding the biological function of otherwise uncharacterized proteins. The success of such methods is due in part to the high sequence complexity of these relatively large domains (approximately 100 residues in length), as well as their common evolutionary heritage, which allow for high-confidence domain identification with few false positives. The short, contiguous segments [termed minimotifs or short linear motifs (SLiMs)] are just as useful in identifying the roles of proteins, but are more difficult to identify with high accuracy. Nevertheless, several bioinformatics resources exist for querying protein sequences for the existence of minimotifs, including Minimotif Miner (MnM), the Eukaryotic Linear Motif (ELM) resource and other specialized databases (1-10). It remains an ongoing pursuit to increase both the sensitivity and accuracy of minimotif prediction in proteins.

This article summarizes the latest release and developments of the MnM database and webserver, version 3.0; additional details can be found in the new MnM user guide on the MnM website. Efforts since our last release in 2008 (4) have concentrated on two fronts: improved filters that increase the accuracy of minimotif prediction by removing false positives (11-13), and increasing the size of the MnM database through both manual annotation of minimotifs from the literature and federation with other databases, including PhosphoSite, DOMINO, MEROPS, UniProt, PepX, 3DID, PeptiDB and HPRD (14-21). MnM 3 now includes a total of 294933 minimotif definitions, consisting of 880 consensus minimotifs and 294053 instances. These minimotifs span three biological activities: trafficking, binding and modifying. Multiple filters have been introduced since our 2008 release of MnM 2, the most important being a combined filtering approach that can achieve 90% accuracy of minimotif prediction with few false positives using one scoring threshold, or identify 38% of minimotifs with no false positives using a more stringent threshold (submitted for publication). The score used by this combined filter is now the default ranking of the minimotif list, replacing the frequency score used in MnM 2.

Besides their role in recognition by protein domains, minimotifs have a number of important biological roles. In addition to binding, minimotifs are often determinants for post-translational modifications and for trafficking proteins to specific parts of cells. Minimotifs are also involved in cell signaling and regulation (22,23). A number of minimotifs are mutated in different diseases, and pathogens such as viruses tend to exploit host machinery through virally encoded minimotifs (24-26). Because of their role in disease, the actions of several drugs are based on a minimotif-mimetic mechanism (27,28).

RESULTS 

Revised minimotif model and new entries in Minimotif Miner 3.0

Prior to adding minimotifs to the MnM 2 database, we first reevaluated our previous model, which presented 22 attributes of a minimotif (12). We have now revised this model to include 28 attributes, as shown in Figure 1. This model contains a protein sequence definition and a functional definition, where the sequence definition describes the chemistry of the motif. The sequence definition can be an instance or a consensus sequence. Instances are the exact amino acid sequences found in the proteins that contain the minimotif, whereas a consensus sequence is an interpretation of a set of instances that indicates degeneracies at certain positions in the amino acid sequence. The consensus sequence definition format is largely based on that previously proposed by the Seefeld Convention and later modified for MnM (12,29). These modifications include an extensible, expanded definition of the covalent chemistry of the minimotif that specifies its position within the protein, any modified residues and their positions in the sequence, a description of any post-translational modifications of amino acids in the sequence, and the corresponding accession numbers from the Psi-Mod database (30).
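The instance/consensus distinction can be made concrete with a small matcher that turns a consensus sequence into a regular expression and scans a protein query. This is a minimal sketch that assumes a simplified PROSITE-like notation (positions separated by '-', x for any residue, [ST] for allowed alternatives); it is not MnM's Seefeld-derived format, and the function names, motif and query sequence are hypothetical.

```python
import re

def consensus_to_regex(consensus: str) -> str:
    """Translate an assumed, simplified consensus notation into a regex."""
    parts = []
    for token in consensus.split("-"):
        if token == "x":
            parts.append(".")            # any residue
        elif token.startswith("[") and token.endswith("]") and len(token) > 2:
            parts.append(token)          # degenerate position, e.g. [ST]
        elif len(token) == 1 and token.isalpha() and token.isupper():
            parts.append(token)          # fixed residue
        else:
            raise ValueError(f"unsupported token: {token!r}")
    return "".join(parts)

def find_matches(consensus: str, protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched peptide) for every match, overlaps included."""
    # A capturing group inside a lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein)]

print(find_matches("R-x-x-[ST]-x-P", "MSRKASLPGRAAS"))  # [(3, 'RKASLP')]
```

Every sequence that satisfies the degeneracies is reported, which is why such low-complexity hits need additional filtering before they can be trusted.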

The functional component of the minimotif model is centered on a syntactical triplet in which the motif source is the subject, the activity is the verb and the target that engages the minimotif is the object. Each triplet has its own unique properties, such as an affinity, a structure, a minimotif reference, database references for cross-referencing external databases and the experimental evidence that supports the minimotif.
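The triplet can be pictured as a small record with subject, verb and object at its core and the triplet-specific properties attached. The Python sketch below is a hypothetical in-memory representation for illustration only; its field names are assumptions and do not mirror the actual MnM schema or its 28 attributes.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MinimotifFunction:
    """Hypothetical record for the functional definition of a minimotif."""
    source: str                     # subject: protein or peptide harboring the motif
    activity: str                   # verb: e.g. 'binds', 'is phosphorylated by'
    target: str                     # object: molecule that engages the motif
    # Properties that belong to this particular triplet:
    affinity: Optional[str] = None  # free text here; units are not modeled
    structures: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)
    db_cross_refs: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)  # experimental techniques

    def sentence(self) -> str:
        """Render the triplet as subject-verb-object text."""
        return f"motif in {self.source} {self.activity} {self.target}"

nef_fyn = MinimotifFunction(source="Nef", activity="binds", target="the SH3 domain of Fyn")
print(nef_fyn.sentence())  # motif in Nef binds the SH3 domain of Fyn
```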

Figure 1. Revised minimotif model. The key elements of the minimotif syntax are colored blue. Orange boxes indicate attributes that are unique to specific minimotif triplets. Yellow ovals are for different attributes of minimotif triplet elements. All attributes except those in the purple boxes were previously described in our minimotif model (12); the purple boxes are new attributes to define motif modifications and activity modifications.

Table 1. Growth of minimotif entries in MnM (columns: MnM 2, MnM 3)
Total: 1VHJ111; 294 933
Consensus sequences: 312
Instance sequences: 294 053
Post-translational modifications: 1 16
Motif sequences: 2224
Motif proteins: 49671
Motif targets: <312; 687; 2620

The motif source, activity and target have a number of previously modeled attributes, but here we have renamed the 'required modification' to 'motif modification' to better distinguish this attribute from 'activity modification'. A motif modification applies when a motif must be covalently modified to engage the target, for example when a minimotif must be phosphorylated to bind 14-3-3. In contrast, an activity modification describes a situation where the target is an enzyme that covalently modifies the minimotif, as when a minimotif becomes myristoylated. The description of these modifications requires more detail than in our original model. To accurately describe these modifications, the new model includes the 'residue' that is modified, the sequence 'position number' of the residue, the 'type' of modification and the 'type code' number of the modification, which for the most part makes use of accession ids in the Psi-Mod database (30).
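The four modification attributes (residue, position number, type and type code) lend themselves to a simple consistency check: the residue recorded as modified should actually occur at the stated position. This sketch assumes 1-based positions relative to the minimotif sequence, covers only a few residue names, and treats the type code as a PSI-MOD-style 'MOD:' accession with five digits; the record layout, peptide and example accession are hypothetical and shown only to illustrate the format.

```python
import re
from dataclasses import dataclass

# Full residue names used in the annotations, mapped to one-letter codes (subset only).
RESIDUE_CODES = {"Serine": "S", "Threonine": "T", "Tyrosine": "Y",
                 "Glycine": "G", "Asparagine": "N", "Lysine": "K"}

@dataclass
class Modification:
    residue: str     # e.g. 'Serine'
    position: int    # assumed 1-based position within the minimotif sequence
    mod_type: str    # e.g. 'phosphorylation'
    type_code: str   # PSI-MOD-style accession, e.g. 'MOD:00046'

def check_modification(sequence: str, mod: Modification) -> list[str]:
    """Return the problems found with one modification record (empty if consistent)."""
    problems = []
    if not 1 <= mod.position <= len(sequence):
        problems.append(f"position {mod.position} is outside a sequence of length {len(sequence)}")
    else:
        expected = RESIDUE_CODES.get(mod.residue)
        found = sequence[mod.position - 1]
        if expected is None:
            problems.append(f"unknown residue name {mod.residue!r}")
        elif found != expected:
            problems.append(f"{mod.residue} expected at position {mod.position}, found {found}")
    # Codes are PSI-MOD accessions only 'for the most part', so other codes are flagged, not rejected.
    if not re.fullmatch(r"MOD:\d{5}", mod.type_code):
        problems.append(f"type code {mod.type_code!r} is not a PSI-MOD-style accession")
    return problems

phospho = Modification("Serine", 4, "phosphorylation", "MOD:00046")
print(check_modification("RSASLP", phospho))  # []
```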

Since the release of MnM 2, the total number of minimotif sequences has increased from about 5300 to almost 300000 (Table 1) (4). The majority of these new entries have been gleaned from federation with other open databases, and some were manually annotated from the primary literature using the MimoSA annotation helper application designed for this task (31). All minimotif entries are annotated using our revised minimotif syntax model (Figure 1) (12). The majority of growth comes from the addition of instances. We have focused on instances because the Minimotif Miner query engine can be used to generate consensus sequences from any set of instances (12). Most of the minimotifs added from external sources are for post-translational modifications. Minimotifs are found in approximately 50000 different proteins in many different species; most minimotifs are from mammalian organisms, although MnM does contain some bacterial, yeast and invertebrate minimotifs as well. The number of domains that interact or associate with minimotifs in MnM is approximately 2600, suggesting that there are still many minimotifs yet to be discovered.
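The query engine's procedure for turning instances into a consensus sequence is not spelled out here. One simple way to do it for pre-aligned, equal-length instances is sketched below: invariant positions stay fixed, positions with a few observed residues become bracketed alternatives, and highly variable positions collapse to x. The notation, the max_alternatives cutoff and the example instances are assumptions made for illustration, not MnM's algorithm.

```python
def derive_consensus(instances: list[str], max_alternatives: int = 2) -> str:
    """Build a simplified consensus from aligned, equal-length instances."""
    if not instances:
        raise ValueError("need at least one instance")
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("instances must be aligned to the same length")
    tokens = []
    for i in range(length):
        residues = sorted({s[i] for s in instances})
        if len(residues) == 1:
            tokens.append(residues[0])                    # invariant position
        elif len(residues) <= max_alternatives:
            tokens.append("[" + "".join(residues) + "]")  # degenerate position
        else:
            tokens.append("x")                            # too variable: any residue
    return "-".join(tokens)

print(derive_consensus(["RSASLP", "RSPSEP", "RTKSVP"]))  # R-[ST]-x-S-x-P
```

Raising max_alternatives keeps more detail (for example [AKP] instead of x), at the cost of a consensus that is more tightly fitted to the instances seen so far.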

There is a minimal set of attributes necessary to define a minimotif for entry into MnM. Minimotif sequences are classified as either an instance or a consensus sequence, each of which has a minimal set of attributes. Consensus sequence definitions must have an amino acid sequence less than 15 residues long, an activity and subactivity, a literature reference, one or more experimental techniques in support of the minimotif, and annotation of any post-translational modifications to the minimotif sequence (residue modified, position in sequence, type of modification and Psi-Mod id for the modification, if available). Instances contain this attribute set, but must also have the name of the sequence harboring the minimotif, an indication of whether the source is a peptide fragment or a protein and, if a protein, an accession number from one of the available protein databases. While we prefer to have information about the target molecule that is associated with the minimotif, this is not required in the minimal set, because such entries are still valuable: they can be used to identify unknown targets by mining-based approaches. For example, an instance of a phosphorylation site on a protein substrate can be used with kinase consensus sequences in the database to predict the target kinase. The 28 attributes of minimotifs are stored in a MySQL database. For the approximately 6000 manually annotated minimotifs, all 28 attributes were entered, except where information was not available from the literature; for example, some minimotifs do not have structures or affinities. Many of the minimotifs imported from external databases have the minimal set of information required to define a minimotif but are missing many of the other attributes defined in our model; we only imported minimotifs that have the minimal set of attributes.
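The minimal attribute set described above amounts to a short list of validation rules. The sketch below encodes them as a check on a plain dictionary; all dictionary keys and example values are hypothetical names chosen for illustration, 'less than 15 residues' is read as a length of at most 14, and one character per position is assumed.

```python
REQUIRED_FIELDS = ("activity", "subactivity", "reference")

def minimal_set_problems(entry: dict) -> list[str]:
    """List the minimal-set requirements an entry fails; an empty list means it can be imported."""
    kind = entry.get("kind")
    if kind not in ("consensus", "instance"):
        return [f"kind must be 'consensus' or 'instance', got {kind!r}"]
    problems = []
    sequence = entry.get("sequence", "")
    # Assumes one character per position; a degenerate notation would need positions counted instead.
    if not sequence:
        problems.append("missing sequence")
    elif len(sequence) >= 15:
        problems.append("sequence must be shorter than 15 residues")
    for name in REQUIRED_FIELDS:
        if not entry.get(name):
            problems.append(f"missing {name}")
    if not entry.get("techniques"):
        problems.append("at least one experimental technique is required")
    if "ptms" not in entry:  # an empty list records 'no modifications'
        problems.append("missing post-translational modification annotation")
    if kind == "instance":
        if not entry.get("source_name"):
            problems.append("missing name of the source sequence")
        source_type = entry.get("source_type")
        if source_type not in ("peptide", "protein"):
            problems.append("source_type must be 'peptide' or 'protein'")
        elif source_type == "protein" and not entry.get("accession"):
            problems.append("a protein source needs an accession number")
    # The target is deliberately not part of the minimal set.
    return problems

entry = {"kind": "instance", "sequence": "RSASLP", "activity": "binds",
         "subactivity": "hypothetical-subactivity", "reference": "hypothetical-ref-1",
         "techniques": ["hypothetical-technique"], "ptms": [],
         "source_name": "hypothetical protein", "source_type": "protein"}
print(minimal_set_problems(entry))  # ['a protein source needs an accession number']
```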

Minimotif filtering to reduce false-positive predictions 

The major difficulty in identifying functional minimotifs within a protein sequence of interest is the high false-positive rate: a large number of predicted minimotifs do not perform the predicted biological function, but coincidentally share the minimotif sequence signature present in other biologically active proteins. These false-positive predictions are notoriously difficult to filter out based on sequence definitions alone, owing to the inherently low sequence complexity of minimotifs (7,32). However, additional context information (beyond amino acid sequence) can be used to narrow the search and effectively filter these false positives, increasing the accuracy of minimotif prediction (11,13). Such context information is routinely employed by individual researchers when evaluating minimotif prediction results. For instance, a researcher studying nuclear import in mouse neurons would quickly discard motif predictions regulating bacterial cell division. In this case, the researcher would be applying context-specific information about molecular function, cellular function and taxonomy to rule out an obvious false positive. While effective, such manual filtering is highly inefficient, both in the time it takes an individual to prune the results list and in the breadth of understanding required to effectively filter all false positives. Over the past 2 years, several contextual filters have been added to the MnM web service; these have been demonstrated to be highly effective in improving the accuracy of minimotif prediction, making the Minimotif Miner results easier to interpret (11,13).

The original implementation, MnM 1.0, did not attempt to filter any false positives, but ranked minimotif predictions in descending order of sequence complexity. Scoring metrics for the location of a minimotif on a protein surface and for evolutionary conservation among divergent species were also provided. MnM 2 allowed the user to filter the results list based on particular minimotif activities of interest and also separated minimotif instances from consensus definitions. Neither of these functionalities formally removes false positives; they simply assist users in filtering based on their own knowledge.

Our first step toward knowledge-based filtering was to model minimotif definitions in a richer way, formally modeling the source and target proteins (12). Tying the source/target protein information to the minimotif definition provides a relationship to the taxonomy of the observed activity, a relationship to other related species through homology databases/searches, a relationship to molecular and cellular function through the source/target annotation and the use of the Gene Ontology definitions, and a relationship to other proteins in the same biological pathway through protein-protein interaction databases (21,33-37). In this manner, context-specific definitions can be applied to filter false positives computationally, removing that burden from the user.

Molecular/cellular function. Knowing that the source protein and target interact directly is highly helpful in identifying true positives in a minimotif search, as minimotif activities require an interaction between the source and target. In the absence of such a direct interaction, filtering for source/target pairs that are active in the same molecular/cellular pathway can also be useful. Functions of source/target pairs are accessed from the Gene Ontology database, which allows filtering based on the molecular/cellular function of the source/target pairs (38). This filtering technique can be used to restrict results to source/target pairs that share a common function, or can be extended to source/target pairs that share a related function. Three thresholds are provided in MnM 2.1 for varying the relatedness of the functions to be filtered (13). The best performing cellular function filter is estimated to result in ~26% sensitivity with 6% selectivity, for a combined discrimination ratio of 4.6, whereas the best performing molecular function filter has a discrimination ratio of 2.9 (Table 2). Sensitivity is the percentage of true positives that are not filtered out, whereas selectivity is the percentage of true negatives that are not removed by the filter (11,13). The discrimination ratio is sensitivity/selectivity.
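Following the definitions above, the three metrics can be computed from counts on a labelled test set. The counts in the example are invented to show the arithmetic and are not MnM results. Unlike sensitivity, a lower selectivity is better, and a filter that removes nothing has a discrimination ratio of 1.

```python
def filter_metrics(tp_kept: int, tp_total: int, tn_kept: int, tn_total: int) -> dict[str, float]:
    """Sensitivity, selectivity and discrimination ratio as defined in the text.

    sensitivity          = % of true positives NOT removed by the filter
    selectivity          = % of true negatives NOT removed by the filter (lower is better)
    discrimination ratio = sensitivity / selectivity
    """
    sensitivity = 100.0 * tp_kept / tp_total
    selectivity = 100.0 * tn_kept / tn_total
    ratio = sensitivity / selectivity if selectivity > 0 else float("inf")
    return {"sensitivity": sensitivity, "selectivity": selectivity, "discrimination_ratio": ratio}

print(filter_metrics(tp_kept=26, tp_total=100, tn_kept=50, tn_total=1000))
# {'sensitivity': 26.0, 'selectivity': 5.0, 'discrimination_ratio': 5.2}
```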

Protein-protein interactions. MnM 2.2 allowed the user to filter results based on known protein-protein interaction (PPI) networks (11). The logic behind this filter is that minimotif predictions are filtered on the basis of experimental verification of the interaction between source and target proteins. MnM makes use of six external databases containing more than 300000 non-redundant PPIs: DIP, Entrez Gene, HPRD, MINT, VirusMINT and IntAct (21,33-35,37,39). In the most stringent use of the PPI filter, only exact matches between source and target are reported, and the predicted minimotif represents a hypothetical mode of interaction for the known PPI. While effective, this stringent filter is limited by the relatively small number of established PPIs. For this reason, the user can extend the filter to include homologous proteins for both the known source and known target of the PPI. This can be done in one of two ways: by accessing homologous protein clusters via the HomoloGene database, or by using BLAST similarity searches to predict homologous proteins not included in HomoloGene (39). Ten default BLAST thresholds are provided in the motif filtering dialog box (Figure 2), accounting for a total of 12 possible PPI filtering choices. The base-level PPI filter is estimated to result in ~62% sensitivity with 2% selectivity, for a combined discrimination ratio of 29; this is the best performing filter and significantly reduces false positives (Table 2).
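The PPI filter reduces to a set lookup: keep a prediction if its source/target pair is a known interaction and, in the extended mode, if any homolog of the source interacts with any homolog of the target. In this sketch the interaction set and homolog map are toy, hypothetical data standing in for the six PPI databases and the HomoloGene clusters; BLAST-based homology thresholds are not modelled.

```python
from itertools import product
from typing import Optional

Prediction = tuple[str, str, str]   # (motif id, source protein, target protein)

def ppi_filter(predictions: list[Prediction],
               known_ppis: set[frozenset[str]],
               homologs: Optional[dict[str, set[str]]] = None) -> list[Prediction]:
    """Keep predictions whose source/target pair (or a homolog pair) is a known PPI."""
    homologs = homologs or {}
    kept = []
    for motif_id, source, target in predictions:
        sources = {source} | homologs.get(source, set())
        targets = {target} | homologs.get(target, set())
        # Interactions are undirected, so pairs are stored as frozensets.
        if any(frozenset((s, t)) in known_ppis for s, t in product(sources, targets)):
            kept.append((motif_id, source, target))
    return kept

known = {frozenset(("A", "B")), frozenset(("A", "E"))}    # toy interaction set
preds = [("m1", "A", "B"), ("m2", "A", "C"), ("m3", "D", "C")]
print(ppi_filter(preds, known))                           # [('m1', 'A', 'B')]
print(ppi_filter(preds, known, homologs={"C": {"E"}}))    # [('m1', 'A', 'B'), ('m2', 'A', 'C')]
```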

Table 2. Comparison of different minimotif filters

Figure 2. Screenshot of minimotif filter selection page. Screenshot of the MnM 3 filter section for choosing approaches for filtering out false-positive minimotifs.

Genetic interactions. A genetic interaction (GI) indicates that there is a functional relationship between two proteins. In some cases, this can be due to direct interactions or modifications of one protein by another. If a minimotif source and predicted target protein have a GI, the prediction can provide a mechanistic explanation for the observed relationship. Because the two proteins of a GI have this relationship, they are more likely to be connected by a minimotif than two unrelated proteins. This concept was implemented in three different GI filters in MnM 2.3 (submitted for publication). The basic GI filter identifies those motif/target pairs where there is a known GI and was the GI filter with the highest accuracy; the GI-node based filter extends the GIs for the source and target one additional interaction away, to a path length of 2; and the GI-HomoloGene filter takes advantage of orthologous GIs. The basic GI filter had a discrimination ratio of 7.3, which was better than the GI-node and GI-HomoloGene filters, with ratios of 4.5 and 2, respectively. The primary difference comes from a poorer selectivity in removing true negatives (~3% versus ~12%); similar sensitivities of 21% and 24% were observed for these filters. The basic GI filter also had a better discrimination ratio than the cellular and molecular function filters, but not as high as that of the PPI filter (Table 2).
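The basic GI filter and the GI-node filter differ in the allowed path length in the genetic interaction network: 1 (a direct GI) versus 2 (source and target linked through one intermediate gene). The breadth-first search sketch below illustrates that check on a toy, hypothetical network; it is not MnM's implementation, and orthology (the GI-HomoloGene variant) is not modelled.

```python
from collections import deque

def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets from a list of genetic interaction pairs."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def within_path_length(graph: dict[str, set[str]], source: str, target: str, max_len: int) -> bool:
    """True if target is reachable from source in at most max_len GI edges (BFS)."""
    if source == target:
        return True
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_len:
            continue           # do not expand beyond the allowed path length
        for neighbor in graph.get(node, set()):
            if neighbor == target:
                return True
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return False

gi = build_graph([("A", "B"), ("B", "C"), ("C", "D")])   # toy network
print(within_path_length(gi, "A", "C", 1))  # False: no direct GI (basic filter)
print(within_path_length(gi, "A", "C", 2))  # True: linked through B (GI-node filter)
```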

Combined filter approach. When examining the effectiveness of the cellular/molecular function filters, we discovered that combining two filters provided greater accuracy in minimotif prediction than either filter alone (13). We have recently extended this idea by training a linear combination of all the filters to maximize both the accuracy and specificity of minimotif search. Using this approach with one threshold, the resulting combined filter increases accuracy to 90%, while a more stringent threshold retained 39% of the true positives while rejecting all false positives in a large test data set. The elimination of all false positives represents an important milestone in minimotif prediction. MnM 3 provides access to all minimotif filters and now ranks all minimotif predictions using this new combined filter score. The results can also be filtered according to either the threshold for maximizing accuracy or the threshold for maximum stringency in removing all false positives (Figure 2). In the minimotif results table, minimotifs with scores above the threshold of 0.91 are highlighted green (these produced no false positives on a test data set), and those with scores between 0.24 and 0.90 are highlighted yellow (these produced high recovery of true minimotifs with only 2% false positives on a test data set). Experimentally validated minimotifs are distinguished from predictions by blue highlighting in the results table.
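The combined filter is described as a trained linear combination of the individual filters, with score bands that set the row colors. The sketch below shows only that shape: a weighted sum of per-filter features followed by banding at the 0.91 and 0.24 thresholds quoted above. The feature names, weights and intercept are placeholders, not the trained MnM coefficients; validated entries are labelled blue before scoring, and the red category follows the Figure 3 legend.

```python
from typing import Optional

# Placeholder weights for illustration; they are not MnM's trained coefficients.
WEIGHTS = {"ppi": 0.5, "gi": 0.2, "cellular_function": 0.15, "molecular_function": 0.15}
INTERCEPT = 0.0
HIGH_STRINGENCY = 0.91   # band with no false positives on the test data set
HIGH_ACCURACY = 0.24     # band with ~2% false positives on the test data set

def combined_score(features: dict[str, float]) -> Optional[float]:
    """Weighted sum of per-filter features; None when any feature is missing."""
    if any(name not in features for name in WEIGHTS):
        return None  # corresponds to a 'null' score in the results table
    return INTERCEPT + sum(w * features[name] for name, w in WEIGHTS.items())

def row_color(score: Optional[float], validated: bool) -> str:
    """Map a prediction to the highlight color described in the text."""
    if validated:
        return "blue"
    if score is None:
        return "red"
    if score >= HIGH_STRINGENCY:
        return "green"
    if score >= HIGH_ACCURACY:
        return "yellow"
    return "red"

print(row_color(combined_score({"ppi": 1, "gi": 0, "cellular_function": 0,
                                "molecular_function": 0}), validated=False))  # yellow
```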

Uses of Minimotif Miner 

The major workflow in Minimotif Miner is to search a single protein query for the presence of minimotifs. This is geared toward identifying new functions in proteins, minimotif determinants of protein-protein interactions, or matching post-translational modifications with potential enzymes that catalyze such modifications. Many different subsets of minimotifs can be selected by using the filtering section of the MnM results page (Figure 2). Once a custom filter combination is selected with radio buttons, clicking the 'Apply Filter(s)' button repopulates the motif results table with the filtered search results. As the number of filtering approaches has grown, this section has been moved from the bottom of the protein sequence menu to a separate expandable section on the results page.

To illustrate the uses of different filters, we explore a sample analysis of HIV-1 Nef (NP_057857). We chose Nef because it is a well-studied protein and we could evaluate minimotif predictions using HIVToolbox (40). Although we expected better filtering from the new algorithm, only a small portion of known minimotifs have been identified and added to MnM; thus we would expect only a subset of predicted minimotifs to have been previously experimentally validated. In the old MnM 2 output, the minimotifs were rank-ordered by frequency score. This ordering often strongly favors a high ranking of instances, which generally are far more information-rich than consensus sequences. The new ranking in MnM 3 depends on many different types of data, and filter testing indicates that the new filters are superior to the minimotif ranking used in MnM 2.

The new default filter ranks 21 minimotif predictions for Nef with a score between 0.24 and 0.91, a range in which few false positives were observed when a test data set was analyzed (Figure 3). This figure shows 19 minimotifs that are colored blue, indicating support by experiments in the literature; we note that two of these minimotifs have scores below 0.24 and three do not have scores because of missing information. Of the other seven high-scoring predicted minimotifs, two were for previously known interactions of Nef with the SH3 domain of Fyn and with AIP-1; both were annotated and added to the MnM database (41,42). One was for a c-Raf1 binding motif consensus sequence for which there is a verified instance. A minimotif was identified for binding to the β subunit of AP1, AP2 and AP3, which was previously known and has now been added to MnM; the motif predicted to interact with the AP2 and AP3 μ subunits was not previously identified (43-46). MnM predicted phosphorylation of Nef by PKCα at three sites, 15, 80 and 103, none of which were present in MnM. In support of these predictions, Nef is known to be phosphorylated at Thr15, and this phosphorylation is inhibited by a PKC inhibitor (47,48). We note that only 29% of >7000 HIV isolate sequences have a Thr at this position of Nef, whereas 98% of these viruses have a Ser at position 103 [analysis with HIVToolbox (40)]. Ser103 was suggested to be phosphorylated by PKCα in vitro (49). Thr80 was predicted as a PKCα site, but no evidence supporting phosphorylation of this site by PKCα could be identified in the literature. MnM also predicted novel interactions of Nef with the C-terminal SH3 domain of Grb2 and a site that binds peptidylprolyl isomerase. In summary, of the 24 minimotifs identified by the MnM analysis (including one with three distinct sites) with scores above a major false-positive threshold, 21 had previously been demonstrated, and we cannot rule out the possibility that the other three are genuine minimotifs that have not yet been discovered. Although Nef is well studied, most proteins have many minimotifs predicted with scores above 0.24 that are yet to be investigated.

Some scientists may want to analyze many protein sequences at once. We have now enabled this type of workflow as an email service, via the batch query input mode on the MnM input page. The input file for the request must contain a list of protein accession numbers from one or more data sources (UniProtKB, MIM, RefSeq, Ensembl, UniGene, PIR, Entrez Gene) and/or protein sequences; the required format is described at a hyperlink in this section of the input page.

Another workflow is identifying minimotifs that play a role in human disease or organism diversity, as originally reported (27,50). In MnM 2, this was accomplished by mapping missense SNPs located in protein-coding regions from the dbSNP database (4,39). In the View menu on the results page, the 'View SNPs' selection reveals known SNPs in the Protein Sequence window, highlighted blue and capitalized. When a SNP is clicked, it is highlighted green and the amino acid change is shown. Any combination of SNPs can be selected. The 'View motifs from New SNPs' item under the 'View' menu creates a new table that identifies any minimotifs that are introduced or eliminated by the selected SNPs. Since many SNPs are disease-associated mutations, this tool can be used to formulate new hypotheses about disease mechanisms.
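The comparison behind 'View motifs from New SNPs' can be thought of as scanning the wild-type and variant sequences with the same motif patterns and taking the difference of the two hit sets. This sketch uses plain regular expressions as motif definitions and a 1-based missense substitution; the motif names, patterns and sequence are hypothetical, and MnM's own matching and scoring are not reproduced.

```python
import re

def apply_missense(seq: str, position: int, ref: str, alt: str) -> str:
    """Return seq with the residue at a 1-based position changed from ref to alt."""
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} is outside the sequence")
    if seq[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}, found {seq[position - 1]}")
    return seq[:position - 1] + alt + seq[position:]

def motif_hits(seq: str, motifs: dict[str, str]) -> set[tuple[str, int]]:
    """(motif name, 1-based start) for every match, overlapping matches included."""
    hits = set()
    for name, pattern in motifs.items():
        for m in re.finditer(f"(?={pattern})", seq):
            hits.add((name, m.start() + 1))
    return hits

def snp_motif_changes(seq: str, position: int, ref: str, alt: str,
                      motifs: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Minimotifs introduced or eliminated by one missense SNP."""
    before = motif_hits(seq, motifs)
    after = motif_hits(apply_missense(seq, position, ref, alt), motifs)
    return {"introduced": sorted(after - before), "eliminated": sorted(before - after)}

motifs = {"PxxP_SH3": "P..P", "RxxS_kinase": "R..S"}   # hypothetical motif set
print(snp_motif_changes("MRKASAAPG", 5, "S", "P", motifs))
# {'introduced': [('PxxP_SH3', 5)], 'eliminated': [('RxxS_kinase', 2)]}
```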



Figure 3. Results table in MnM 3 from analysis of HIV-1 Nef protein. Nef (NP_057857) was analyzed to produce the minimotif predictions shown. Column 1 shows the minimotif sequence, column 2 shows the function of the minimotif, column 3 shows the amino acid position(s) of the start residue of the minimotif, column 4 shows the combined filter score, and column 5 shows the number of occurrences of each motif in the entire HIV-1 proteome. Rows colored blue are for minimotifs that are experimentally validated, yellow for those above a threshold for high-accuracy prediction, and red for those below this threshold or without the data needed to calculate a combined filter score (null).

DISCUSSION AND CONCLUSIONS

We have expanded the minimotif model to contain 28 attributes, which offers a number of advantages. Segregating specific attributes reduces ambiguities and faulty annotations, makes it easy to identify missing data, and allows the use of many different types of controlled vocabularies. The rich model enables easy mining of data through SQL queries in a number of ways. For example, MnM provides Query Engine, a widget-based custom query builder for mining the MnM database. Using this tool, consensus sequences or position-specific scoring matrices can be generated for minimotifs for which many different instances were studied, often in separate laboratories. In this manner, our model, which maintains the instance information in its raw form, becomes a rich source for automated generation of consensus motifs.
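As one way of generating a position-specific scoring matrix from many aligned instances, the sketch below computes log2-odds scores with a pseudocount against a uniform background over the 20 standard amino acids. The uniform background, the pseudocount of 1 and the example instances are simplifying assumptions for illustration; this is not the Query Engine's implementation, and a real analysis would usually use proteome-derived background frequencies.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(instances: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds PSSM from aligned, equal-length instances, uniform background."""
    if not instances:
        raise ValueError("need at least one instance")
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("instances must be aligned to the same length")
    background = 1.0 / len(AMINO_ACIDS)
    denominator = len(instances) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(length):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for s in instances:
            counts[s[i]] += 1   # raises KeyError for non-standard residues
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / denominator / background)
                     for aa in AMINO_ACIDS})
    return pssm

def score_peptide(pssm: list[dict[str, float]], peptide: str) -> float:
    """Sum of per-position log-odds scores for a peptide of the PSSM's length."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the PSSM length")
    return sum(column[aa] for column, aa in zip(pssm, peptide))

pssm = build_pssm(["RSASLP", "RSPSEP", "RTKSVP"])
print(score_peptide(pssm, "RSASLP") > score_peptide(pssm, "AAAAAA"))  # True
```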

MnM offers a unique resource that is synergistic with other minimotif search tools. MnM is a broad-based minimotif resource that covers all types of minimotifs from any species, now with approximately 300000 minimotifs. A brief comparison with some other motif tools is given here, but there are far too many tools to present a comprehensive review. The Eukaryotic Linear Motif server is the closest broad-based minimotif resource, with 170 consensus motifs and 1817 instances (6). Phospho-ELM, an associated database that focuses on phosphorylation sites, has approximately 42000 instances (5). These tools use a different but overlapping set of approaches to help reduce false positives. Other sites, such as Scansite and DomPep, use position-specific scoring matrices for predicting new instances, but focus on a set of protein binding domains (3,10). MOTIPs can be used to search proteomes for minimotifs (9). SLiMSearch 2.0 and MyHits allow proteome searches with user-defined motifs (8,51).

Minimotif Miner 3 is an important improvement over MnM 2. The number of minimotif sequences has increased by nearly two orders of magnitude, vastly improving the sensitivity of minimotif search. This large increase in the number of potential minimotifs could hinder rather than help researchers were it not for filtering mechanisms that reduce the number of false positives. The recently introduced filtering mechanisms, based on protein-protein interactions, molecular function, cellular function, genetic interactions and the combined filter, greatly improve the accuracy and specificity of MnM 3 search results.



AVAILABILITY 

The MnM database can be accessed through single protein 
or batch queries using the MnM user interface. The entire 
database is not currently available for download, but the 
MnM investigators are open to collaborations that involve 
using the database. 